Abstract

Power analysis attacks against embedded secret-key cryptosystems have been widely
studied since the seminal 1998 paper of Paul Kocher, Joshua Jaffe and Benjamin Jun,
which introduced the powerful Differential Power Analysis (DPA). The strength of
DPA is such that it became necessary to develop sound and efficient countermeasures.
Nowadays embedded cryptographic primitives usually integrate one or several of these
countermeasures (e.g. masking techniques, asynchronous designs, balanced dynamic
dual-rail gate designs, noise addition, power consumption smoothing, etc.). This
document presents a simple, yet interesting, countermeasure to DPA and HO-DPA
attacks, called the brutal countermeasure, together with new power analysis attacks using
multi-linear approximations (MLPA attacks), based on very recent and still unpublished
results of Tavernier et al.

Keywords: Power Analysis, MLPA, multi-linear cryptanalysis, Reed-Muller codes. 

1 Introduction 

Since the discovery of Differential Power Analysis (DPA) and High-Order Differential
Power Analysis (HO-DPA) attacks in 1998 ([13]), the urge to develop resistant hardware
implementations of symmetric ciphers has not ceased. Two families lead the most popular
countermeasures against these devastating attacks : the transformed masking methods
(initiated by M.-L. Akkar and C. Giraud) and the duplication method (first proposed
by L. Goubin and J. Patarin in [9]). While the duplication method of rank n has been
shown to be vulnerable to an n-th order DPA [3], the masking method, which tries
to randomize the information leaked from the target device, gave better results in terms
of resistance and performance. Thus, after several proposals of enhanced DES
implementations [2] [3] [1], the work of Jiqiang Lv and Yongfei Han in 2005 ([16]) finally proposed
an enhanced version of DES claimed to be secure against DPA and HO-DPA. To our
knowledge, this countermeasure still holds against those attacks. It uses the unique
masking method of [3], where a new random mask is used for every encryption. Hence,
before each encryption, a set of several custom SBoxes (dependent on the newly generated
mask) is generated and stored in RAM. These techniques have the serious drawback of
assuming that the SBox generation is done in a secure way (i.e. no information should leak
from this operation [3]); otherwise it is easy to see that the leaked information would lead
to HO-DPAs combining consumption traces recorded during the SBox generation with
consumption traces recorded during the actual encryption. Because the implementation of
such countermeasures must be thoroughly reviewed, they eventually slow down the
designers of such embedded systems (smartcards, FPGA devices) and hence the product's
time to market. Moreover the resulting implementation, which integrates the additional
computations (SBox generation), might prove inefficient in terms of execution time
because of the need for secure computations [3].
We present here a brutal way to counteract Power Analysis attacks. The advantages of the
countermeasure come from its simplicity and from how it naturally disables relevant
information leakage, making it easier to design and implement without assuming that any
part of the design is more secure than another. We will discuss its cost compared to Jiqiang
Lv and Yongfei Han's bounds for DES unique masking countermeasures [16], thus isolating
some cases where the brutal countermeasure is attractive to designers. Then we introduce a
new set of power analysis attacks based on linear and multi-linear cryptanalysis that put
the first bounds on the brutal countermeasure for DES and AES. Finally we give the
current results obtained by MLPA attacks on some simulations and on some real consumption
traces (the DPA contest traces found at http://www.dpacontest.org/).

2 Preliminaries on embedded symmetric ciphers and Power 
Analysis attacks 

This section first presents the symmetric cipher design model on which our study
is based, and then the way Power Analysis attacks can be applied to such designs.

2.1 Embedded symmetric cipher design model 

Our study restricts itself to smartcards and FPGA devices that are meant to bear a
symmetric cipher implementation. As it is now commonly accepted that hardware
implementations of symmetric (as well as asymmetric) ciphers achieve both better
performance and better security, the development of such devices has increased
tremendously in the last few years. Symmetric cipher hardware implementations can take
many forms : synchronous or asynchronous designs, pipelined versions, and
implementations designed for restricted area, low consumption and/or high throughput. For
the sake of clarity, we will describe the studied designs using the common shape of
symmetric ciphers : a Substitution-Permutation Network (SPN) organized in rounds (the key
schedule is not taken into account in our study; we only suppose the round keys to be
available when needed). A symmetric cipher can be represented as in Figure 1 (note that
the sub-blocks within a round can be ordered more or less differently).
Figure 1: Schematic of a Symmetric Cipher



The permutation part of the cipher, as well as the add-round-key part, are linear
functions that can be implemented very efficiently in hardware with simple combinational
logic. However, the substitution part is usually made of SBoxes, which are highly
non-linear functions on 4, 6 (DES) or 8 (AES) bits and are not so easy to implement in
combinational logic. As a matter of fact, in many designs the SBoxes are stored as lookup
tables in memory (RAM or ROM) and accessed when needed in order to save critical logic space.
Hence, one way to implement one round of the symmetric cipher is to split it into three clock
cycles, the first one dedicated to the add-round-key function, the second one to the lookup
table accesses of the SBoxes and the last one to the diffusion function. Of course each
of them can be split again into several clock cycles if needed (in AES for instance, there can
be 8 RAM accesses to the same SBox in one round or just one RAM access if the SBox is
duplicated in RAM).

Furthermore, when throughput is more critical than area, it is usually fairly easy to
pipeline the executions; in that case it is mandatory to implement each round instead
of just one round and a loop counter.

To our knowledge, Power Analysis attacks on smartcards (ASIC) and FPGAs are carried out on
implementations following these specifications, and they will be the basis of our study of
PA attacks and countermeasures. This high-level design (what is computed during
each clock cycle) is considered to be known by the attacker, as some probing techniques
would give him this information anyway.
2.2 Power Analysis Attacks

Power analysis is a dynamic and active field of research, driven by the need for
resistant cryptographic hardware devices. The study of PA attacks and their
countermeasures has taken off prodigiously since the introduction of the very efficient
DPA attacks in 1998.

Power consumption in CMOS circuits Without going into the depths of CMOS gate
power consumption (a simple presentation, sufficient for our needs, can be found in [17],
pages 27-60), what we would like to point out here is that the power consumption of CMOS
circuits depends on the manipulated data : transitions from 0 to 1 and from 1 to 0
consume significantly more power than 0 to 0 or 1 to 1 transitions through a logic gate.
An attacker observing the overall consumption of a CMOS circuit during two different
executions can tell, at a chosen point in time, which execution led to the greater number
of data changes. It is worth noting, however, that the power consumption
of combinational logic (in ASIC or FPGA) at a point within a clock cycle does not give the
attacker relevant information on the data, since one usually assumes that the attacker does
not know the netlist precisely enough to predict the glitches occurring
throughout the logic circuit (see [17], pages 39-40). Consequently, power analysis attacks
are based on the study of register and bus power consumption, since their data
transitions are synchronized with the clock edges and do not involve combinational logic.
To our knowledge all PA attacks are based on this principle.

Hamming distance and Hamming weight models When considering the consumption
of a bus or register, since the power consumption is significantly higher when a bit value
changes, the Hamming distance model (HD) says that the power consumption is closely
related to the Hamming weight of the difference (bitwise XOR) of two successive data
values. Note that, of course, the absolute values of the measured power traces are of no
use to the attacker, but values relative to other measurements are relevant.
A simpler model, the Hamming weight model (HW), approximates the power consumption
directly by the Hamming weight of the manipulated data value.
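As an illustration of the two leakage models, here is a minimal Python sketch. The function names (`hamming_weight`, `hamming_distance`, `predicted_leakage`) are ours and purely illustrative; the sketch only computes the model prediction for one register update, not a real power value.

```python
def hamming_weight(value: int) -> int:
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")


def hamming_distance(previous: int, current: int) -> int:
    """Number of bit positions that toggle when a register goes from
    `previous` to `current` (Hamming weight of their bitwise XOR)."""
    return hamming_weight(previous ^ current)


def predicted_leakage(previous: int, current: int, model: str = "HD") -> int:
    """Model-dependent leakage prediction for one register update."""
    if model == "HW":
        return hamming_weight(current)
    if model == "HD":
        return hamming_distance(previous, current)
    raise ValueError("model must be 'HW' or 'HD'")
```

Note that when the register is reset to zero before the update, the HD prediction reduces to the HW prediction of the new value.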

Other models exist; they are basically variants of those models, based on some knowledge
the attacker might have of the targeted hardware design (see [17], pages 38-43).

2.2.1 SPA, DPA, HO-DPA 

SPA, DPA and HO-DPA attacks are semi-invasive passive attacks introduced in [13] by
Paul Kocher, Joshua Jaffe and Benjamin Jun in 1998. Their semi-invasiveness and
passiveness make them easy to set up, i.e. there is no need for a complete knowledge of the
implementation, timing analysis, and so on. Let us give a rough description of these
attacks and introduce some useful notations.

Simple Power Analysis SPA is the simplest way to use power analysis in order to attack
a cryptographic implementation. It requires interpreting the power consumption trace of
the cryptographic function execution. According to [13], SPA can be used to break
cryptographic implementations in which the execution path depends on the data being
processed (e.g. conditional branching, comparisons, multipliers, exponentiators, etc.).
Furthermore the authors consider the prevention of SPA to be fairly simple.

Differential Power Analysis The efficiency of DPA attacks comes from the fact that
instead of directly studying the power consumption over the execution time, it focuses
on data-related instructions. By statistical means, DPA allows the attacker to suppress
the measurement noise and bring data-dependent operations to light. Let us borrow the
notations of [13] here :

• $T_i[j]$ : the $j$-th sample of $T_i$, the $i$-th recorded power trace.

• $D(P,B,K_s)$ : DPA selection function, which computes B (Hamming weight of intermediate
bits at a fixed point in time) as a function of a secret key block $K_s$ and the
plaintext P (it could also be the ciphertext C). In the original DPA from [13] on DES,
B is the Hamming weight of one intermediate bit (i.e. the value of one bit). For now
let us assume that the value of B is 0 or 1.

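After observing m executions of the cryptographic primitive, recording each power trace
$T_{1 \ldots m}[1 \ldots k]$ (k samples) and the corresponding plaintexts $P_{1 \ldots m}$
(respectively ciphertexts $C_{1 \ldots m}$), the attacker computes the values
$\{B_i\}_{i=1 \ldots m}$ using the selection function $D(P_i, B_i, K_s)$ (for an arbitrary
fixed $K_s$). The traces are divided into two sets $S_0$ and $S_1$, such that
$T_i \in S_0$ iff $B_i = 0$ and $T_i \in S_1$ otherwise, and the differential trace over
the k samples is computed :

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(P_i,B_i,K_s)\,T_i[j]}{\sum_{i=1}^{m} D(P_i,B_i,K_s)} - \frac{\sum_{i=1}^{m} \left(1 - D(P_i,B_i,K_s)\right) T_i[j]}{\sum_{i=1}^{m} \left(1 - D(P_i,B_i,K_s)\right)}$$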

If $K_s$ was a wrong guess, then the values $\{B_i\}_{i=1 \ldots m}$ are not related to the
manipulated data and, as the number of tests increases ($m \to \infty$), the differential
trace tends to a flat trace ($\forall j = 1 \ldots k$, $\Delta_D[j] \to 0$). On the other
hand, if $K_s$ was a right guess, the values $\{B_i\}_{i=1 \ldots m}$ are correct and the
differential trace is related to the power consumption that coincides with the value of B.
Furthermore, the values of the other bits and the measurement noise, not being considered
by D, affect the differential trace less and less as the number of tests increases.
Hence, as m increases, the differential trace will bear spikes on the samples where the
manipulated data is correlated with D.
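The partitioning described above can be sketched as follows in Python. This is a minimal illustration, not the authors' implementation : traces are plain lists of floats, and `select(plaintext, key)` is a hypothetical single-bit selection function supplied by the caller.

```python
def differential_trace(traces, selection_bits):
    """Difference of means between the traces whose predicted bit is 1
    and those whose predicted bit is 0, computed sample by sample."""
    ones = [t for t, b in zip(traces, selection_bits) if b == 1]
    zeros = [t for t, b in zip(traces, selection_bits) if b == 0]
    if not ones or not zeros:
        raise ValueError("both partitions must be non-empty")
    n_samples = len(traces[0])
    return [
        sum(t[j] for t in ones) / len(ones) - sum(t[j] for t in zeros) / len(zeros)
        for j in range(n_samples)
    ]


def best_key_guess(traces, plaintexts, select, key_space):
    """Return the key guess whose differential trace shows the largest spike.
    `select(plaintext, key)` is a hypothetical single-bit selection function."""
    def peak(key):
        bits = [select(p, key) for p in plaintexts]
        try:
            return max(abs(v) for v in differential_trace(traces, bits))
        except ValueError:
            # A guess that puts every trace in the same set carries no information.
            return 0.0
    return max(key_space, key=peak)
```

With real measurements the spike of the right guess only emerges from the noise once enough traces have been averaged, which is the data complexity m discussed in the text.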

Remarks Other methods have been developed to evaluate more or less precisely the
correlations between the power consumption traces and the selection functions; the interested
reader can refer to the work of E. Brier, C. Clavier and F. Olivier in [6], which uses the
Pearson coefficient (CPA), and to the maximum likelihood method of R. Bevan and E. Knudsen [3].
Moreover, while our description details single-bit DPA (B represents a single bit), more
complicated selection functions can be used where B takes more than two values (Hamming
weight of an intermediate data value). Attacks of this kind (multi-bit DPA) have
been gathered under the name Partitioning Power Analysis (PPA) by Thanh-Ha Le, Jessy
Clediere, Cecile Canovas, Bruno Robisson, Christine Serviere and Jean-Louis Lacoume.

High-order DPA In an n-th order DPA, a combination of n points in the data path is
involved in the selection function, i.e. for each power trace, n samples are combined
in the same differential trace $\Delta_D(1 \ldots n)$.
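The text does not say how the n samples are combined. As a hedged illustration for n = 2, the sketch below uses the absolute difference of two samples, a common preprocessing choice in the second-order DPA literature and an assumption on our part. The function names are illustrative.

```python
def combine_samples(trace, j1, j2):
    """Second-order preprocessing: absolute difference of two samples
    of the same trace (assumed combining function)."""
    return abs(trace[j1] - trace[j2])


def preprocess(traces, j1, j2):
    """Turn every trace into a one-sample trace that can be fed to a
    first-order difference-of-means analysis."""
    return [[combine_samples(t, j1, j2)] for t in traces]
```

In an idealized, noise-free setting where sample `j1` leaks a masked bit `v ^ m` and sample `j2` leaks the mask bit `m`, the combined value equals `v`, which is why combining two points can defeat a single mask.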

2.2.2 Countermeasures 

As introduced in Section 1, the unique masking technique uses fresh random data for
every encryption function call in order to randomize the power consumption. Additive
masking consists in manipulating data that have been XORed with a random value
(the mask) and tracking the mask value throughout the cipher execution so that it can be
removed when needed (at the end of a round, of a set of rounds or even at the end of the
cipher execution). Even though tracking the additive mask value is fairly easy through
linear functions, it proves tricky through highly non-linear functions such as
SBoxes. Hence, the proposed masking techniques [2] [3] [1] generate custom
SBoxes related to the current masks, such that the custom SBoxes make it possible to
easily track the mask values. The custom SBoxes are then stored in RAM; the original
versions of the SBoxes can be stored in RAM or ROM. In [16], the authors proved that
three 32-bit random masks and six custom SBoxes are the minimal cost for a secure DES
implementation masking all the outputs of the SBoxes of the sixteen rounds.
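The generation of a mask-dependent SBox can be sketched as follows. This is a simplified illustration on a toy 4-bit bijection (`TOY_SBOX`, chosen by us only for convenience), not the DES construction of [16] : the custom table maps a masked input directly to a masked output, so the unmasked value never appears during the lookup.

```python
import secrets

# Toy 4-bit bijective S-box, used here only as a stand-in for a real cipher S-box.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def masked_sbox(sbox, mask_in, mask_out):
    """Custom table S' with S'(x ^ mask_in) = S(x) ^ mask_out for every x."""
    return [sbox[x ^ mask_in] ^ mask_out for x in range(len(sbox))]


def masked_lookup_demo(value, sbox=TOY_SBOX):
    """Evaluate S(value) through a freshly masked table, then unmask."""
    n_bits = (len(sbox) - 1).bit_length()
    mask_in = secrets.randbelow(1 << n_bits)    # fresh input mask
    mask_out = secrets.randbelow(1 << n_bits)   # fresh output mask
    table = masked_sbox(sbox, mask_in, mask_out)
    masked_value = value ^ mask_in              # what the datapath manipulates
    masked_result = table[masked_value]         # still masked after the lookup
    return masked_result ^ mask_out             # unmasking, done only when needed
```

The loop inside `masked_sbox` is exactly the pre-computation that, as noted in the introduction, must itself not leak.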

3 A brutal countermeasure

As detailed in the previous section, power analysis attacks are based on the
study of the power consumption of registers or buses, as the transitions from one data
value to another inside them happen at a precise time of the clock cycle, which makes it
possible to determine precisely the consumption of such a transition. This consumption is
assumed to be closely related to the Hamming weight of the manipulated data (directly in
the HW model, or through the difference of two successive data values in the HD model).
DPA attacks work assuming the attacker can predict the value of one bit (or of a set of
bits) actually manipulated by a register or bus as a function of the known input (or
output) bits of the cryptographic primitive and a few key bits. In practice there should
not be more than 32 key bits involved [16], otherwise the attack could not be carried out
considering its cost in memory and acquisition time.

From the above considerations, a straightforward way to disable such Power Analysis
attacks is to suppress the use of registers and buses until every bit stored in registers or
going through the buses is either independent of the secret key or dependent on more than 32
bits of the secret key (i.e. before a certain number of rounds has been computed).

3.1 Countermeasure setup and drawbacks

Depending on the diffusion functions of the target symmetric cipher, one can fix the number
of rounds that must be executed during one clock cycle (i.e. between two registers or two
accesses to a bus). Let us consider the two most popular symmetric ciphers : the Data
Encryption Standard (DES) and its successor the Advanced Encryption Standard (AES).
The brutal countermeasure for DES would be to compute the first three rounds by pure
combinational logic in one clock cycle and, by symmetry, to do the same thing
for the last three rounds. For AES, since its diffusion function is more efficient, the first
round should be done in one clock cycle, as well as the last two rounds (since the last round
of AES does not contain the MixColumn diffusion). Let us call these incompressible blocks
the "glued blocks".

The obvious drawback of this countermeasure is that it makes it mandatory to implement
the SBoxes in combinational logic (using a LUT implementation for instance). Furthermore,
on a pipelined implementation, it would limit the overall throughput (since it forbids
dividing the first and last blocks of logic into several clock cycles).

The advantages of the countermeasure are its great simplicity of implementation (no need for
additional functions) and the fact that it does not rely on a secure pre-computation.
It seems important to note here that this countermeasure is not compatible with the unique
masking methods since those methods, as seen in Section 2.2.2, need to generate
mask-dependent SBoxes at runtime.

Drawback bypass In some cases it is possible to work around the pipeline drawback. When
area is not critical, it is possible to put several glued blocks in parallel, driven by a
slower clock (generated by a PLL component for instance), and to connect them to the original
round implementations that run on a faster clock. This solution would keep a high
throughput even with the countermeasure.

Let us also note that the AES SBox has a very efficient implementation in terms of speed
and area using the multiplicative inverse function in GF(2^8) ([11]).
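To make the algebraic structure of the AES SBox concrete, here is a small software sketch that rebuilds it from the multiplicative inverse in GF(2^8) followed by the standard affine transformation. It is only meant to show why no lookup table is strictly required; it is not the hardware design of [11], and the function names are ours.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254 (0 maps to 0)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox(x: int) -> int:
    """AES S-box: field inversion followed by the affine transformation."""
    inv = gf_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
```

A combinational implementation would replace the loops above by a fixed network of gates, typically using a composite-field decomposition of the inversion.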

Finally, this countermeasure may be attractive to designers who have a large
combinational logic space available and give priority to strong security, even though the
cost in area is very high.

4 (M)LPA Attack description and complexity 

This section introduces Linear Power Analysis and Multi-Linear Power Analysis
attacks. These attacks are the strict counterparts of linear ([18]) and multi-linear
([12]) cryptanalysis in the side-channel world. We first introduce some useful notations for
the study of linear approximations. Then we introduce the idea of LPA and MLPA
before describing the attack algorithms and their complexity. Finally we discuss the
practical setup of the attacks.

4.1 Linear approximations of a symmetric cipher 

Linear cryptanalysis was introduced by Matsui in 1993 ([18]). Since then it has
become one of the most important foundations of the study of block cipher security. Nowadays
new block ciphers must prove some inherent resistance against linear cryptanalysis. Let us
remark that many cryptanalysis methods are based on this fundamental discovery; among
others, multi-linear cryptanalysis [12] [5] will be particularly interesting here.

Linear cryptanalysis A linear approximation expresses a linear combination of ciphertext
bits as a linear function of plaintext and key bits.

Let us denote by |K|, |P|, |C| the bit-lengths of the key, plaintext and ciphertext respectively.
Let us consider a vector $\Pi$ of length |P|, $\kappa$ of length |K|, $\Gamma$ of length |C| and a bit b.
$\Pi$, $\kappa$, $\Gamma$ and b define a linear approximation of bias $\epsilon$ over the symmetric cipher if and only
if :

$$\Pr\left( \langle P, \Pi \rangle \oplus \langle K, \kappa \rangle \oplus b = \langle C(P,K), \Gamma \rangle \right) \geq 1/2 + \epsilon \qquad (1)$$

Given such a linear equation, Matsui showed that recovering the key bits involved in the
equation with a high probability of success using linear cryptanalysis requires a data
complexity (i.e. number of plaintext-ciphertext pairs) of $O(1/\epsilon^2)$.
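As a small worked example of what a bias is, the sketch below measures the bias of input/output mask pairs on a single SBox by exhaustive counting. It works on a toy 4-bit bijection of our choosing and on one SBox only; the approximations of the text cover several rounds of a full cipher and are found by other means.

```python
def parity(value: int) -> int:
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(value).count("1") & 1


def linear_bias(sbox, in_mask: int, out_mask: int) -> float:
    """Bias of the relation <x, in_mask> = <S(x), out_mask> over all inputs x."""
    agree = sum(
        1 for x, y in enumerate(sbox)
        if parity(x & in_mask) == parity(y & out_mask)
    )
    return agree / len(sbox) - 0.5


def best_approximations(sbox, top: int = 5):
    """Non-trivial mask pairs sorted by decreasing absolute bias
    (assumes an n-bit to n-bit S-box)."""
    n = len(sbox)
    table = [
        (abs(linear_bias(sbox, a, b)), a, b)
        for a in range(1, n)
        for b in range(1, n)
    ]
    return sorted(table, reverse=True)[:top]
```

For a bijective SBox, any pair with exactly one zero mask is perfectly balanced (bias 0), so only pairs with two non-zero masks are worth listing.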

Multi-linear cryptanalysis It was shown in [5] that instead of using a single linear
approximation, the use of several linear approximations involving the same key bits
significantly improves the performance of the attack. As a matter of fact, given n linearly
independent approximations of respective biases $\epsilon_j$, $j = 1, \ldots, n$, the data
complexity of the attack is reduced to $O\left(1 / \sum_{j=1}^{n} \epsilon_j^2\right)$.
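The gain brought by several approximations can be illustrated numerically. The sketch below assumes the usual order-of-magnitude estimates, namely about 1/ε² pairs for one approximation and about 1/Σ ε_j² pairs for n approximations, with all constant factors dropped; the function names are ours.

```python
def single_complexity(bias: float) -> float:
    """Order of magnitude of the pairs needed with one approximation."""
    return 1.0 / bias ** 2


def multi_complexity(biases) -> float:
    """Order of magnitude of the pairs needed with several approximations
    that involve the same key bits."""
    return 1.0 / sum(e * e for e in biases)


def gain(biases) -> float:
    """Reduction factor compared with using only the best approximation."""
    best = max(abs(e) for e in biases)
    return single_complexity(best) / multi_complexity(biases)
```

For instance, n approximations of equal bias divide the estimate by n, whereas adding approximations with much smaller biases brings little.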

In a very recent paper, yet to be published, Tavernier et al. ([15]) studied the problem
of finding all the linear approximations with a given bias of a given Boolean function. The
authors showed the equivalence between the problem of finding linear approximations for
a fixed output mask ($\Gamma$ fixed) and a list decoding problem in the first-order Reed-Muller
code. They were then able to find good linear approximations for up to 8 rounds of DES and
thus, based on the results of [8], break a reduced version of the cipher with low data-complexity
(2^^ plaintext-ciphertext pairs). 
4.2 Introduction to (M)LPA

As mentioned above, (M)LPA uses linear approximations to attack a symmetric
cipher hardware implementation by power analysis. We introduce two different
ways for an attacker to use linear approximations; the latter is the so-called (M)LPA
attack. Let us denote by H(u) the Hamming weight of a vector of bits u.

A first approach : a classical approach A very straightforward approach would be to
attack by DPA, CPA or PPA using a linear approximation as the basis of the selection function.
This makes the selection function of the attack dependent on the approximation bias
$\epsilon$ and thus increases the data complexity. The advantage of such an attack is that one
can look for linear approximations that involve few key bits (less than 32 in practice) when
evaluating data values, stored in registers or going through buses, that strictly depend on
more than 32 key bits from the point of view of the cipher function. Hence it would allow
attacking a cipher implementation where the unique masking technique or the brutal
countermeasure is used only for the data bits that depend on less than 32 key bits.
For instance let us consider the mono-bit DPA attack presented by Kocher in [13]. Using
the notations introduced in Section 2.2.1, let us denote by m the complexity of the attack
when the selection function $D(P,b,K)$ is not probabilistic (classic DPA) and by M the
complexity when the selection function $D_\epsilon(P,b,K)$ is probabilistic (meaning that it
has probability $1/2 + \epsilon$ of being right). The k-sample differential trace
$\Delta_D[1 \ldots k]$ is then, for each sample j :

$$\Delta_D[j] \approx 2 \left( \frac{\sum_{i=1}^{M} D_\epsilon(P_i,b,K)\,T_i[j]}{\sum_{i=1}^{M} D_\epsilon(P_i,b,K)} - \frac{\sum_{i=1}^{M} T_i[j]}{M} \right)$$

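It is easy to see that when the key guess is wrong, the probabilistic selection function is not
correlated to the manipulated data (just like the original selection function) and the
differential trace tends to a flat trace when $M \to \infty$. Let us now consider that the
key guess is right. Since $D_\epsilon$ is right with probability $p = 1/2 + \epsilon$, let us
denote by $D_{true}$ the cases where the selection function is right and by $D_{false}$ the
cases where it is wrong. Then, after re-indexing the plaintexts and traces, we have

$$\Delta_D[j] \approx 2 \left( \frac{\sum_{i=1}^{(1/2+\epsilon)M} D_{true}(P_i,b,K)\,T_i[j] + \sum_{i=(1/2+\epsilon)M+1}^{M} D_{false}(P_i,b,K)\,T_i[j]}{\sum_{i=1}^{M} D_\epsilon(P_i,b,K)} - \frac{\sum_{i=1}^{M} T_i[j]}{M} \right)$$

$$= 2 \left( \frac{\sum_{i=1}^{2\epsilon M} D_{true}(P_i,b,K)\,T_i[j]}{\sum_{i=1}^{M} D_\epsilon(P_i,b,K)} + \frac{\sum_{i=2\epsilon M+1}^{M} \tilde{D}(P_i,b,K)\,T_i[j]}{\sum_{i=1}^{M} D_\epsilon(P_i,b,K)} - \frac{\sum_{i=1}^{M} T_i[j]}{M} \right)$$

where $\tilde{D}$ is an uncorrelated selection function (it has 1 chance out of 2 of being
wrong), obtained by pairing each of the $(1/2-\epsilon)M$ wrong evaluations with a right one;
its contribution therefore tends to a flat trace when $M \to \infty$. Finally, the data
complexity of the attack is such that $2\epsilon M \geq m$; in other words, the complexity of
the attack increases by a factor $1/(2\epsilon)$ when the selection function has a bias
$\epsilon$.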
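The loss of amplitude caused by a probabilistic selection function can be checked on a toy example. The sketch below is deterministic and noise-free : each trace has a single sample equal to the true bit, and a fraction 1/2 − ε of the predictions in each class is flipped on purpose. The helper names are illustrative.

```python
def difference_of_means(samples, selection):
    """Mean of the samples predicted as 1 minus mean of those predicted as 0."""
    ones = [t for t, d in zip(samples, selection) if d == 1]
    zeros = [t for t, d in zip(samples, selection) if d == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def flip_fraction(bits, eps):
    """Copy of `bits` in which a fraction (1/2 - eps) of each class is wrong,
    i.e. a selection function that is right with probability 1/2 + eps."""
    out = list(bits)
    for value in (0, 1):
        indices = [i for i, b in enumerate(bits) if b == value]
        n_wrong = round((0.5 - eps) * len(indices))
        for i in indices[:n_wrong]:
            out[i] = 1 - value
    return out
```

With a perfect selection function the spike has amplitude 1 in this toy setting; with a bias ε it shrinks to 2ε, which is consistent with the 1/(2ε) factor on the number of traces stated above.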
Remark 1 Let us note here that the term $\sum_{i=1}^{M} D_\epsilon(P_i,b,K) \approx M/2$ in
the above equation crushes the amplitude of the potential spikes; in practice, $\epsilon$
does not have to be very small for the data complexity to become unreachable. The
measurement acquisition time cannot be neglected in Power Analysis attacks.

Remark 2 The attack described above can easily be extended to attacks using multi-linear
approximations.

Second approach : a HD and HW models approach An interesting way to use
linear approximations would be to directly approximate the Hamming weight of a register,
since this is the quantity most correlated with what is being measured. Thanks
to the work of Tavernier et al. (in [15]), it is possible to find linear approximations of
$\langle H(C(P,K)), \Gamma_H \rangle$ for any chosen vector $\Gamma_H$ ($\Gamma_H$ is a vector
of length $\log_2(|C|)$, with respect to the notations of Section 4.1).

If we assume that the actual value of the measurement samples $T_i[j]$ is closely related
to the Hamming weight of the manipulated data (for the HW model) or of the
difference between two successive manipulated data values (for the HD model), then the use of
linear approximations of the Hamming weight of a register (or a bus) would lead to
very efficient attacks (this assumption is discussed in Section 4.3.2).
This important remark is the origin of the new MLPA attacks, which should prove
much more dangerous than the previous DPA-like approach.
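To illustrate what a linear approximation of a Hamming weight looks like, the sketch below measures, by exhaustive counting, the bias of relations between input bits of a single toy SBox and bits of the Hamming weight of its output. The brute-force search is only a stand-in for the list-decoding algorithm of [15], which is not reproduced here; the names and the 4-bit setting are assumptions of the sketch.

```python
def parity(value: int) -> int:
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(value).count("1") & 1


def hamming_weight(value: int) -> int:
    """Number of bits set to 1."""
    return bin(value).count("1")


def hw_bit_bias(sbox, in_mask: int, hw_mask: int) -> float:
    """Bias of <x, in_mask> = <H(S(x)), hw_mask>, where H(S(x)) is the
    Hamming weight of the S-box output written as a binary vector."""
    agree = sum(
        1 for x, y in enumerate(sbox)
        if parity(x & in_mask) == parity(hamming_weight(y) & hw_mask)
    )
    return agree / len(sbox) - 0.5


def search_hw_approximations(sbox, threshold: float):
    """Exhaustive search for (bias, in_mask, hw_mask) with |bias| >= threshold."""
    n_bits = (len(sbox) - 1).bit_length()      # output width of the S-box
    hw_width = n_bits.bit_length()             # bits needed to write 0..n_bits
    found = []
    for in_mask in range(1, len(sbox)):
        for hw_mask in range(1, 1 << hw_width):
            bias = hw_bit_bias(sbox, in_mask, hw_mask)
            if abs(bias) >= threshold:
                found.append((bias, in_mask, hw_mask))
    return found
```

In an attack, the Hamming weight would not be computed from the cipher but read, up to noise, from the measured sample.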

4.3 The MLPA attack 

As introduced in the previous section, the LPA attack is based on the HW and HD
models. If we assume that these models are relevant, then multi-linear approximations can
be used in all their strength. As presented in [8] [15] in the context of classical multi-linear
cryptanalysis, one can consider the recovery of some key bits as the decoding problem of a
code whose length is equal to the number of available linear relations, over a memoryless
channel whose capacity depends on the respective biases of the linear approximations. Let
us consider a set of n linear relations of biases $\epsilon_l$, $l = 1, \ldots, n$, of the
following form :

$$\langle P, \Pi_l \rangle \oplus \langle H(C(P,K)), \Gamma_{H_l} \rangle \oplus b_l = \langle K, \kappa_l \rangle \qquad (2)$$

where the set of vectors $\kappa_l$, $l = 1 \ldots n$, is such that a limited number k of key
bits are involved in the equations (in practice less than 32 bits) and forms a matrix of rank k.
The idea is to reconstruct a codeword $y$ of length $2^k$ from a noisy and erased codeword
$\tilde{y}$ which is close enough to $y$ to be decoded in the first-order Reed-Muller code.

4.3.1 Attack algorithm 

After observing N encryptions and selecting, in each trace $T_i$, $i = 1 \ldots N$, the sample j
where the target intermediate data bits are manipulated, the attack proceeds as follows :

1. For each linear approximation and each "plaintext-$T_i[j]$" pair (for the HD model
it would be each "plaintext pair-$T_i[j]$ pair"), compute the predicted value of
$\langle K, \kappa_l \rangle_i$ using the left-hand side of equation (2) (which is
"$\langle P_i, \Pi_l \rangle \oplus \langle T_i[j], \Gamma_{H_l} \rangle \oplus b_l$", since
$T_i[j]$ is considered as corresponding to $H(C(P_i,K))$).

2. For each linear approximation, separate the traces into two sets $S_0^l$ and $S_1^l$ for
which $\langle K, \kappa_l \rangle$ has been evaluated to 0 and 1 respectively.

3. Construct the noisy and erased codeword $\tilde{y}$ such that the value of $\tilde{y}$ at
position $x_l = \kappa_l$ ($\kappa_l$ is seen here as its value in $GF(2^k)$) is
$\tilde{y}(x_l) = (\#\{S_0^l\} - \#\{S_1^l\}) \ln(\ldots)$. The positions where no linear
approximation is defined are set to zero, thus being treated as erasure positions.

4. Decode $\tilde{y}$ in the first-order Reed-Muller code, i.e. the most probable codeword
$y$ is the one that maximises the inner product
$\sum_{x \in \{0,1\}^k} (-1)^{y(x)} \tilde{y}(x)$. The Fast Fourier Transform does this
with time complexity $O(k 2^k)$ and data complexity $O(2^k)$.

For details on the efficiency of Reed-Muller decoding over a Gaussian and erasure channel,
the interested reader should refer to the results of I. Dumer and R. Krichevskiy in [7].
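Steps 2 to 4 of the algorithm above can be sketched as follows. This is a simplified, assumption-laden illustration : the votes are taken as already computed (step 1), each position is the integer value of $\kappa_l$ restricted to the k involved key bits, the weighting factor of step 3 is omitted, and decoding is done with a fast Walsh-Hadamard transform. All names are ours.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform of a list whose length is a power of 2.
    Output index v holds sum_x values[x] * (-1)^<v, x>."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


def build_codeword(votes, k):
    """Noisy and erased codeword: at each position with votes, the number of
    predictions equal to 0 minus the number equal to 1; 0 (erasure) elsewhere.
    `votes` maps a position to the list of predicted bits <K, kappa_l>."""
    word = [0] * (1 << k)
    for position, bits in votes.items():
        word[position] = bits.count(0) - bits.count(1)
    return word


def decode_key(word):
    """Key candidate (as a k-bit integer) maximising the inner product
    sum_x (-1)^<key, x> * word[x]."""
    spectrum = fwht(word)
    return max(range(len(spectrum)), key=lambda v: spectrum[v])
```

The transform evaluates all $2^k$ key candidates at once in about $k 2^k$ additions, instead of the $2^{2k}$ operations of a naive search over candidates and positions.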

4.3.2 Practical setup 

The attack presented above may seem completely unrealistic since it directly uses the
measured value as the Hamming weight of the manipulated data, which contradicts
the remark made in Section 2.2 on the use of absolute measurement values. Two practical
setups seem possible to bypass this :

• First of all, let us assume that the targeted device can be run with chosen plaintexts.
Under this hypothesis it is possible to attack by re-initializing the registers before
each encryption (resetting the registers means running a set of fixed plaintexts until
the device is in the same state before each encryption). Then, using simple
pre-testing on the board, it would be possible to relate the consumption traces to the
targeted quantities, modelled as following a Gaussian law.

• For a more practical attack, assuming that we have access to a twin device on which
we can set arbitrarily chosen keys, it would be possible to run the algorithm that
searches for linear approximations directly on the twin device, as a pre-processing phase
of our attack. As the algorithm treats a Boolean function as a black box, using
the consumption measurement as the output value of our Boolean function might make
the attack even more efficient than in the model presented above. Furthermore, it is
then possible to mount attacks on unknown ciphers, since no knowledge of the symmetric
cipher is needed except for its SPN structure (the hardware device is seen as a black
box whose outputs are the consumption leakages).
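The "pre-testing" of the first setup can be pictured as a small profiling step. The sketch below is one possible reading, not the authors' procedure : for each known Hamming weight, it estimates the mean and standard deviation of the measured sample, then maps a new measurement to the weight with the nearest mean (the maximum-likelihood choice if all classes are Gaussian with the same variance, which is an assumption). Function names are illustrative.

```python
from statistics import mean, pstdev


def profile_leakage(samples, weights):
    """Per-class Gaussian parameters: maps each Hamming weight to the
    (mean, standard deviation) of the samples measured for that weight."""
    groups = {}
    for sample, weight in zip(samples, weights):
        groups.setdefault(weight, []).append(sample)
    return {w: (mean(v), pstdev(v)) for w, v in groups.items()}


def estimate_weight(sample, profile):
    """Hamming weight whose profiled mean is closest to the measurement."""
    return min(profile, key=lambda w: abs(sample - profile[w][0]))
```

The estimated weight can then play the role of $T_i[j]$ in equation (2); how noisy this estimate is directly affects the effective bias of the relations.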
4.3.3 Results



This section presents the results obtained using the attacks described above on
the DES and AES ciphers. There are two sets of results. The first ones are called simulations
and can be seen as the validation of our attack in a theoretical model. The second set of
experiments has been carried out on real power traces and validates the practical feasibility
of the attack. Table 1 and Table 2 summarize some of the results. In these tables,
"# linear equ." refers to the total number of linear approximations found for the attack (not
all of them have been useful), "# Plaintext" or "# Traces" refers to the data complexity of the
attack and "Pr(Success)" refers to the probability of success of the attack in simulation.

Attack simulations The algorithm described in Section 4.3.1 has been simulated on the
DES and AES ciphers. By means of Tavernier et al.'s work on finding linear approximations,
up to three rounds can be approximated with good enough biases for the Hamming
weight of an intermediate data value. The figures of Table 1 summarize our results
(with respect to the HW and HD models). They show that a glued block of three rounds for a
DES version of the brutal countermeasure would not be enough. The simulation has been
done considering that the leakage of the cipher implementation gives the Hamming weight of
the targeted data. Hence, in the HW model, the linear approximations evaluate the Hamming
weight of the round register (assuming that there is a register after a glued block of 1, 2
or 3 rounds); in the HD model, the linear approximations evaluate the Hamming weight of
the differences of the round register between two executions (two different plaintexts). Let
us note here that in a chosen plaintext attack, the HW model results correspond to an HD
model.
Table 1: Simulation Results



It is important to note here that no linear approximation has been found for the first
round in the HD model, as if no information leaked from the Hamming weight of the data
manipulated. The attack on AES has been done on the last round since it does not contain
the MixColumn diffusion function.

Attack on DPA-contest traces Thanks to the DPA contest, power consumption traces
are freely available. As we were unable to obtain and set up a hardware device ourselves, these
traces available online allowed us to try our attack on real power traces and thus to show
its feasibility in a real setup. The attack has been launched on the contest
traces (secmatv1_2006_04_0809), which contain about 80000 power consumption traces. The
linear approximations evaluate the Hamming weight of the difference of data stored in the
implementation register (LR) (see [10] for more details on the DES implementation); the
attack description and setup can be found in the Annex of this document.



Cipher   Rounds   # linear equ.   # key bits   # traces
DES      1        84              ~20          1000
DES      1        84              45           20000
DES      2        163             ~10          1000
DES      2        163             47           36000



Table 2: Attack on DPA-contest traces Results 



5 Conclusion and future work 

The results shown in section 4.3.3 demonstrate the feasibility of the MLPA attacks. We
believe that this set of attacks is a starting point for new results on power analysis attacks
on embedded symmetric ciphers. The next steps will be of two kinds:

• The search for better linear approximations, both in terms of bias and in the number
of rounds of the symmetric cipher they can approximate. This requires computation
time that we did not have while writing this document.

• Experimentation on an unknown cipher implementation, searching for linear
approximations directly on the board. This may lead to very efficient attacks,
since it directly approximates the leakage function without using any consumption
model.
Annex: Setup of the attack on DPA-contest traces

This annex describes an MLPA attack on power traces found on the DPA contest website:
http://www.dpacontest.org/. The traces used for our attack are stored under the name
secmatv1_2006_04_0809; there are 81089 power traces, measured from a
straightforward DES implementation detailed in [10].

The implementation is described in Figure 2 (from [10]). Let us denote by H(X) the
Hamming weight function, by IP(X) the initial permutation of the DES cipher and by
DES_n(X, K) the first n rounds of the DES encryption on a 64-bit vector X and an
(n x 64)-bit key K. The power measurement samples we are interested in are the ones
corresponding to the load of the register LR after rounds 1 and 2. According to the
Hamming Distance model, they should correspond to H(IP(X) XOR DES_1(X, K))
(denoted C_1(X, K)) and H(DES_1(X, K) XOR DES_2(X, K)) (denoted C_2(X, K))
respectively.

Figure 2: Schematic of DES implementation

The sample indexes were found by simply simulating a DPA attack on the first round and
on the second round (using the first round key). We believe that this information could
have been found by an attacker using simple timing measurements; in any case, the MLPA
attack assumes that this information is known. The load of register LR after the first
round (respectively the second round) was found to correspond to the 5749th
(respectively the 6374th) sample of the power traces.
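As an illustration of how such a sample index can be located, here is a minimal difference-of-means sketch in Python. It is not the procedure actually run on the contest traces: the function name `dpa_peak_sample`, the synthetic traces and the leaking position are all hypothetical, and the selection bit is assumed to be already predicted from a known key.

```python
import random


def dpa_peak_sample(traces, selection_bits):
    """Return the sample index with the largest |difference of means|.

    traces: list of equal-length lists of power samples.
    selection_bits: one predicted bit (0 or 1) per trace.
    """
    ones = [t for t, b in zip(traces, selection_bits) if b == 1]
    zeros = [t for t, b in zip(traces, selection_bits) if b == 0]
    if not ones or not zeros:
        raise ValueError("both partitions must be non-empty")
    best_index, best_gap = 0, -1.0
    for j in range(len(traces[0])):
        mean_one = sum(t[j] for t in ones) / len(ones)
        mean_zero = sum(t[j] for t in zeros) / len(zeros)
        gap = abs(mean_one - mean_zero)
        if gap > best_gap:
            best_index, best_gap = j, gap
    return best_index


# Synthetic demonstration: 400 noisy traces of 50 samples, where only
# sample 17 (an arbitrary, hypothetical position) depends on the bit.
rng = random.Random(0)
LEAKY_SAMPLE = 17
bits = [rng.randint(0, 1) for _ in range(400)]
traces = []
for b in bits:
    trace = [rng.gauss(0.0, 1.0) for _ in range(50)]
    trace[LEAKY_SAMPLE] += 3.0 * b
    traces.append(trace)

peak = dpa_peak_sample(traces, bits)
```

On real traces the leakage is far weaker than in this synthetic example, so many more traces are needed before the peak stands out.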

Linear approximations have been generated corresponding to <C_i(P, K), Th>, i ∈
{1, 2}. Only the ones where Th equals 0x10 or 0x20 were kept. Table 3 gives an example
of 11 of these approximations for the second round (C_2). Over these 11 equations, only 6
key bits are involved (K[j] is the jth bit of the secret key). The last thing needed to
apply the MLPA algorithm is a way to determine the value <C_i(P, K), Th>, i ∈ {1, 2},
from the consumption measurement at the selected sample. To simplify the attack, we
restrict the output mask (Th) to 0x10 or 0x20, because we then only have to separate the
traces into two sets: S_1, the traces whose power measure is greater than the average
power measure, and S_0, the others. We assume that the power traces in S_1 are such that
<C_i(P, K), 0x20> = 1, and that <C_i(P, K), 0x20> = 0 for the others. We then assume that
the power traces in S_1 are such that <C_i(P, K), 0x10> = 0, and that
<C_i(P, K), 0x10> = 1 for the others, since there is very little chance of having
C_i(P, K) < 0x10 or C_i(P, K) > 0x30 from random plaintexts.
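The partition into S_1 and S_0 can be sketched in a few lines of Python. The function name `split_by_mean` and the example measurements are hypothetical; for readability the example uses the Hamming distances themselves as "measurements", which real traces only approximate up to noise.

```python
def split_by_mean(samples):
    """Estimate the bits <C, 0x20> and <C, 0x10> for each trace.

    samples: power measurement of each trace at the selected sample index.
    Returns a list of (bit_0x20, bit_0x10) pairs, one per trace.
    """
    mean = sum(samples) / len(samples)
    estimates = []
    for s in samples:
        bit_0x20 = 1 if s > mean else 0  # trace in S_1: bit 0x20 assumed set
        bit_0x10 = 1 - bit_0x20          # assumed complementary (checked below)
        estimates.append((bit_0x20, bit_0x10))
    return estimates


# For every value in the range 0x10..0x2F exactly one of the two mask
# bits is set, which is why the second bit is taken as the complement.
complementary = all(
    ((c >> 5) & 1) + ((c >> 4) & 1) == 1 for c in range(0x10, 0x30)
)

# Hypothetical measurements (illustration only).
example = split_by_mean([0x1C, 0x23, 0x1F, 0x21])
```

With a single-bit mask, the inner product <C, Th> is simply the corresponding bit of C, so thresholding at the mean is enough to obtain a (noisy) estimate of it.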





Th     Bias     Equation
0x10   0.0219   P[5] + P[26] + P[27] + P[31] + P[45] + P[53] + P[61] + K[6] + K[7] + K[29] + K[38] + K[52]
0x20   0.0215   1 + P[5] + P[26] + P[27] + P[31] + P[45] + P[53] + P[61] + K[6] + K[7] + K[29] + K[38] + K[52]
0x20   0.0134   P[28] + P[29] + P[31] + P[37] + P[45] + P[53] + K[6] + K[7] + K[29] + K[61]
0x20   0.0156   1 + P[5] + P[28] + P[29] + P[31] + P[37] + P[45] + K[6] + K[29] + K[38] + K[61]
0x10   0.0142   P[5] + P[28] + P[29] + P[31] + P[37] + P[45] + K[6] + K[29] + K[38] + K[61]
0x20   0.0189   1 + P[5] + P[28] + P[29] + P[31] + P[37] + P[53] + K[7] + K[29] + K[38] + K[61]
0x10   0.0189   P[5] + P[28] + P[29] + P[31] + P[37] + P[53] + K[7] + K[29] + K[38] + K[61]
0x10   0.0126   1 + P[26] + P[27] + P[37] + P[45] + P[53] + P[61] + K[6] + K[7] + K[52] + K[61]
0x20   0.0163   P[5] + P[8] + P[9] + P[37] + P[45] + P[53] + P[61] + K[6] + K[7] + K[38] + K[52] + K[61]
0x10   0.0167   1 + P[5] + P[8] + P[9] + P[37] + P[45] + P[53] + P[61] + K[6] + K[7] + K[38] + K[52] + K[61]
0x10   0.0215   1 + P[5] + P[14] + P[15] + P[31] + P[37] + P[45] + P[61] + K[6] + K[29] + K[38] + K[52] + K[61]
0x10   0.0146   P[5] + P[28] + P[29] + P[31] + P[37] + P[45] + P[61] + K[6] + K[29] + K[38] + K[52] + K[61]
0x10   0.0148   1 + P[5] + P[8] + P[9] + P[31] + P[37] + P[45] + P[61] + K[6] + K[29] + K[38] + K[52] + K[61]
0x20   0.0223   P[5] + P[14] + P[15] + P[31] + P[37] + P[45] + P[61] + K[6] + K[29] + K[38] + K[52] + K[61]
0x20   0.0182   P[5] + P[28] + P[29] + P[31] + P[37] + P[53] + P[61] + K[7] + K[29] + K[38] + K[52] + K[61]
0x10   0.0152   P[5] + P[26] + P[27] + P[31] + P[37] + P[53] + P[61] + K[7] + K[29] + K[38] + K[52] + K[61]
0x10   0.0187   1 + P[5] + P[28] + P[29] + P[31] + P[37] + P[53] + P[61] + K[7] + K[29] + K[38] + K[52] + K[61]
0x20   0.0157   1 + P[5] + P[26] + P[27] + P[31] + P[37] + P[53] + P[61] + K[7] + K[29] + K[38] + K[52] + K[61]
0x20   0.0191   P[5] + P[26] + P[27] + P[31] + P[37] + P[45] + P[53] + P[61] + K[6] + K[7] + K[29] + K[38] + K[52] + K[61]
0x10   0.0183   1 + P[5] + P[26] + P[27] + P[31] + P[37] + P[45] + P[53] + P[61] + K[6] + K[7] + K[29] + K[38] + K[52] + K[61]



Table 3: Attack on DPA-contest traces Results 



With this setup, and considering only these 11 equations, the 6 key bits are retrieved
from the first 2000 traces.
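To make the use of a single approximation concrete, the following Python sketch estimates the XOR of the key bits of one equation by a simple majority vote over the traces. This is a simplified, Matsui-style illustration and not the MLPA decoding algorithm of this document. The names `parity_of` and `vote_key_parity`, the 0-based bit numbering, the simulated error rate of 30% (far more favourable than the biases of Table 3) and the random key are all assumptions made for the example; the index sets follow one equation of Table 3 as we read it.

```python
import random


def parity_of(bits, indices):
    """XOR of bits[i] for i in indices (bits is a sequence of 0/1)."""
    p = 0
    for i in indices:
        p ^= bits[i]
    return p


def vote_key_parity(plaintexts, observed_bits, constant, plaintext_indices):
    """Majority-vote estimate of the XOR of the key bits of one equation.

    Assumes the approximation reads
        observed_bit = constant + sum(P[i]) + sum(K[j])  (mod 2)
    with probability above 1/2.
    """
    ones = 0
    for p, b in zip(plaintexts, observed_bits):
        ones += b ^ constant ^ parity_of(p, plaintext_indices)
    return 1 if 2 * ones > len(plaintexts) else 0


# Synthetic demonstration with a hypothetical random key.
rng = random.Random(1)
P_IDX = [28, 29, 31, 37, 45, 53]
K_IDX = [6, 7, 29, 61]
key = [rng.randint(0, 1) for _ in range(64)]
true_parity = parity_of(key, K_IDX)

plaintexts, observed = [], []
for _ in range(2000):
    p = [rng.randint(0, 1) for _ in range(64)]
    bit = parity_of(p, P_IDX) ^ true_parity
    if rng.random() < 0.3:  # the approximation fails 30% of the time
        bit ^= 1
    plaintexts.append(p)
    observed.append(bit)

estimate = vote_key_parity(plaintexts, observed, 0, P_IDX)
```

Each equation yields one parity relation between key bits; combining several equations that involve the same few key bits is what allows the individual bits to be recovered.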